Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for OpenEphemeral bytecode #768

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

diegoreis42
Copy link
Contributor

@diegoreis42 diegoreis42 commented Jan 23, 2025

First step to close #741. It allows creating temporary table operations and improve flexibility in handling in-memory data. Since is my first time working on core code, every feedback is welcome. @jussisaurio @pereman2 @PThorpe92 @penberg

@penberg
Copy link
Collaborator

penberg commented Jan 26, 2025

Hey @diegoreis42, looks very sensible to me!

core/vdbe/mod.rs Outdated
@@ -260,12 +269,19 @@ impl ProgramState {
}

macro_rules! must_be_btree_cursor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro will probably have to be renamed?

Copy link
Contributor Author

@diegoreis42 diegoreis42 Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must_be_general_cursor maybe?

@diegoreis42 diegoreis42 marked this pull request as ready for review January 29, 2025 14:50
core/vdbe/mod.rs Outdated
cursors
.get_mut(*cursor_id)
.unwrap()
.replace(Cursor::new_ephemeral(EphemeralCursor::new()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure what should be returned here.

@diegoreis42 diegoreis42 changed the title wip: Add support for OpenEphemeral bytecode Add support for OpenEphemeral bytecode Jan 31, 2025
use crate::{types::OwnedValue, Result};

pub struct EphemeralCursor {
table: Rc<RefCell<EphemeralTable>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be a Rc<RefCell<T>> right? Just table: EphemeralTable

}
}
SeekKey::IndexKey(index_key) => {
// Seek by index key (ignoring row ID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support the key being either a rowid or an arbitrary key. From SQLite docs:

Open a new cursor P1 to a transient table. The cursor is always opened read/write even if the main database is read-only. The ephemeral table is deleted automatically when the cursor is closed.
If the cursor P1 is already opened on an ephemeral table, the table is cleared (all content is erased).

P2 is the number of columns in the ephemeral table. The cursor points to a BTree table if P4==0 and to a BTree index if P4 is not 0. If P4 is not NULL, it points to a KeyInfo structure that defines the format of keys in the index.

So I think EphemeralTable should be something like

pub struct EphemeralTable<Key: Ord> {
    pub rows: BTreeMap<Key, Vec<OwnedValue>>,
    pub next_rowid: u64,                      // generate rowids
    pub columns: Vec<Column>,                 // columns
}

OR define it as an enum:

enum Ephemeral {
  Table(EphemeralTable)
  Index(EphemeralIndex)
}

and have separate definitions.

Either way, searching with an index key should be O(log n) in an ephemeral index, instead of O(n) as here.

@diegoreis42 diegoreis42 force-pushed the feat/open-transient branch 9 times, most recently from baf3a0c to e5a4212 Compare February 6, 2025 02:14
@diegoreis42 diegoreis42 mentioned this pull request Feb 6, 2025
Take the logical AND of the values in registers P1 and P2 and write the result into register P3. If either P1 or P2 is 0 (false) then the result is 0 even if the other input is NULL. A NULL and true or two NULLs give a NULL output.
@redixhumayun
Copy link
Contributor

@diegoreis42 any idea when you plan to merge this in? Having this for DROP TABLE in #897 would be super helpful.

Copy link
Collaborator

@jussisaurio jussisaurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I think there's still a few issues that need addressing in some way. I left some suggestions which may or may not make sense :)

// No matching record found
self.rowid = None;
self.current = None;
self.null_flag = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null_flag is an outer join specific concept and shouldn't be set here. it is set on when a LEFT JOIN condition is not met. imagine the following:

CREATE TABLE a(foo);
CREATE TABLE b(foo);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (2);

SELECT a.foo, b.foo FROM a LEFT JOIN b on a.foo = b.foo;
1|NULL

the null flag is set on for the cursor of table b in the example so that when you call Rowid or Column on that cursor, you always get NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I didn't knew what is the purpose of null_flag, so I just copied, this makes much more sense.

unreachable!("index seek key should be a record");
};

let search_key = index_key.clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably make the key type of EphemeralIndex to be just Vec<OwnedValue> so we dont have to do so much cloning here and below, we can just pass a slice to the .range() calls like rows.range(index_key.values[..last]..)


let rows = &index.rows;
let mut range = match op {
SeekOp::EQ => rows.range(trimmed_key.clone()..=trimmed_key.clone()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EQ doesnt need to use range, right?

};

// this is an obvious makeshift but I didn't find any better way to do it
if matches!(op, SeekOp::GT) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use rows.range(Bound::Excluded(val), Bound::Unbounded) (https://doc.rust-lang.org/beta/std/ops/enum.Bound.html) to avoid this

if let Some(record) = range.next() {
let rowid = match record.values.last() {
Some(OwnedValue::Integer(rowid)) => *rowid as u64,
_ => unreachable!("index records should have an integer rowid"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...as the last element"

) -> Result<CursorResult<()>> {
match &mut self.source {
Ephemeral::Table(table) => {
let rowid = if moved_before {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the significance of moved_before here? struggling a bit to see how this would be used.


index
.rows
.retain(|r| !r.values.contains(&OwnedValue::Integer(*rowid)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several problems with this:

  • This scans the entire btree on every insert
  • For every element, it scans all the values
  • It might even delete false positives, what if there's an unrelated integer element that happens to match the rowid?

What are we trying to do here? Isn't record already supposed to contain the rowid as the last element, and insertion should fail if the same full key already exists?


self.rowid = Some(rowid);
self.current = Some(record.clone());
self.null_flag = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i already mentioned about null_flag but just highlighting this as well

}
} else if let Some(current_rowid) = self.rowid {
if let Some((&next_rowid, row_data)) =
table.rows.range((current_rowid + 1)..).next()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is that we're doing an O(log n) seek on every call to next() which makes a table scan of the ephemeral table have a complexity of O(n log n).

I'm not sure what would be the best way to implement this actually, perhaps we need to store the Range and do something like this:

struct EphemeralTable<'a> {
    set: BTreeSet<usize>,
    iterator: RefCell<Option<std::collections::btree_set::Range<'a, usize>>>,
    current: RefCell<Option<usize>>,
}

impl<'a> EphemeralTable<'a> {
    fn new() -> Self {
        Self { set: BTreeSet::new(), iterator: RefCell::new(None), current: RefCell::new(None) }
    }

    fn next(&'a self) {
        if self.iterator.borrow().is_none() {
            *self.iterator.borrow_mut() = Some(self.set.range(..));
        }
        *self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next().cloned();
    }
    
    fn prev(&'a self) {
        if self.iterator.borrow().is_none() {
            *self.iterator.borrow_mut() = Some(self.set.range(..));
        }
        *self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next_back().cloned();
    }

    fn seek(&'a self, value: usize) {
        let range = self.set.range(value..);
        self.iterator.borrow_mut().replace(range);
        *self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next().cloned();
    }

    fn insert(&mut self, value: usize) {
        self.set.insert(value);
    }

    fn current(&self) -> Option<usize> {
        self.current.borrow().clone()
    }
}

@@ -316,6 +340,11 @@ impl VTabOpaqueCursor {
}
}

pub enum GeneralCursor<'a> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh both of these are BTree cursors, one of them just uses an in memory BTreeMap/BTreeSet and is not persistent. wonder if we should call this

pub enum GeneralBTreeCursor {
  Persistent(&'a mut BTreeCursor),
  Ephemeral(&'a mut EphemeralCursor)
}

also, both the persistent and ephemeral cursors share a lot of the same methods and now there's a lot of pattern matching duplication in various opcodes, so wondering if it would make sense to just do it like this:

impl <'a> GeneralBTreeCursor<'a> {
  fn rewind(...) {
    match self {
       Self::Persistent(p) => c.rewind(...)),
       Self::Ephemeral(e) => e.rewind(...)),
    }
  }
}

@diegoreis42
Copy link
Contributor Author

diegoreis42 commented Feb 11, 2025

@diegoreis42 any idea when you plan to merge this in? Having this for DROP TABLE in #897 would be super helpful.

I'm quite busy at my work this week, so probably I'll address this issue this weekend, but I'll do my best to start sooner.

@redixhumayun
Copy link
Contributor

@diegoreis42 any idea when you plan to merge this in? Having this for DROP TABLE in #897 would be super helpful.

I'm quite busy in at my work this week, so probably I'll address this issue this weekend, but I'll do my best to start sooner.

No worries! I can always raise another PR to augment DROP TABLE once this is merged in 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Count distinct is not supported
5 participants